/*==============================================================================
Flag and prepare region codes for merge

Overview:
Some NUTS codes in the employment data are composites of several NUTS regions,
because NUTS regions have changed over time.
Composite NUTS codes must be broken into their separate NUTS regions before
merging with the WVS/EVS.

This do file: 

I. Breaks composite NUTS codes into separate NUTS regions
(a) Merge the nuts variable in the employment data with the list of NUTS regions.
Note: values that do not merge are composite regions (Swiss regions would also
fail to merge, but are dropped first in (b)).
(b) Drop U.S., Canadian and non-composite Swiss regions (done before the merge in (a)).
(c) Separate the component NUTS codes of each composite, using "&" as the delimiter.
(d) Count the number of component regions in each composite; store the count in
the new variable composite_total.
(e) Create duplicates of composite regions, where the number of copies of each
composite is equal to composite_total.
		The nuts region of each copy is renamed as one of the composite's
		constituent NUTS regions.
		The original composite NUTS code is stored in a new variable
		composite_nuts.
		Example: the composite "CH021&CH025" becomes two rows with nuts equal to
		"CH021" and "CH025", each with composite_nuts equal to "CH021&CH025".


II. Classifies NUTS regions for which we have data by NUTS level (i.e. 1-3)
(a) Join with the list of NUTS regions, which contains the variable nuts_level.

Note: After the match is made with EVS/WVS in step 0520, the answers to the values
questions will be collapsed back to the composite region using a
population-weighted mean.

==============================================================================*/

clear all
set more off

*(a) merge nuts variable in employment data with list of NUTS regions. 
* Note: non-merged values are composite regions or Swiss regions. 
* nuts_level_merge.csv lists all NUTS regions and specifies their level (1-3). Downloaded from
* http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_CLS_DLD&StrNom=NUTS_33&StrLanguageCode=EN
insheet using "$insheet_files/nuts_level_merge.csv"
sort nuts
save "$dta_files/nuts_level_merge.dta", replace

use nuts country using "$dta_files/step301_cleanup_ipolate.dta", clear

*(b) drop U.S., Canadian and Swiss regions, keeping only the Swiss composite CH021&CH025
drop if (country=="CH" & nuts!="CH021&CH025") | country=="US" | country=="CA"

sort nuts
merge m:1 nuts using "$dta_files/nuts_level_merge.dta", gen(NL)
keep if NL==1 //keep observations that did not merge with the list of NUTS regions; these are the composite NUTS codes
keep nuts
duplicates drop //one observation for each composite NUTS code

*(c) separate the component NUTS codes of each composite, using "&" as the delimiter
rename nuts nuts_component
split nuts_component, p("&")
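
* Optional safeguard (a sketch, not part of the original workflow): the steps
* below hardcode at most two components per composite. If split had created a
* third component variable, composite_total would be undercounted, so stop here.
* The message text and return code 459 are illustrative choices.
capture confirm variable nuts_component3
if _rc == 0 {
	display as error "a composite NUTS code has more than two components"
	exit 459
}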

*(d) count the number of component NUTS regions in each composite, store count in new variable composite_total

*flag each non-empty component; the comparison is evaluated for all observations at once
forval x=1/2 {
	gen byte nuts_composite`x'_flag = (nuts_component`x'!="")
}

egen composite_total = rowtotal(nuts_composite1_flag-nuts_composite2_flag)
drop nuts_composite1_flag-nuts_composite2_flag

*create a list of composite NUTS codes, with one row for each component that is
* part of a composite
rename nuts_component nuts
sort nuts
reshape long nuts_component, i(nuts composite_total) 
drop if nuts_component==""
drop _j
by nuts: gen composite_count=_n
sort nuts composite_count
tempfile temp1
save `temp1'.dta, replace

keep nuts composite_total 
duplicates drop
sort nuts
tempfile temp2 
capture drop _merge
save `temp2'.dta, replace

*merge in composite region flag
use nuts country year using "$dta_files/step301_cleanup_ipolate.dta" , clear
duplicates drop nuts , force

sort nuts
capture drop _merge
merge m:1 nuts using `temp2'.dta 
gen nuts_composite_flag=(_merge==3)
drop _merge

*(e) Create duplicates of composite regions, where the number of copies of each composite is equal to composite_total.
* expand acts on every selected observation at once, so no observation-level loop is needed
quietly expand composite_total if nuts_composite_flag==1

*merge in individual components of composite NUTS regions
sort nuts year 
by nuts year: gen composite_count=_n if nuts_composite_flag==1
sort nuts composite_count
merge m:1 nuts composite_count using `temp1'.dta 

assert _merge==3 if nuts_composite_flag==1
drop _merge

sort nuts year composite_count

*The original composite NUTS region is stored in a new variable composite_nuts.  
gen composite_nuts=nuts if nuts_composite_flag==1

*Each duplicate is assigned one of the constituent NUTS regions in variable nuts.
replace nuts=nuts_component if nuts_composite_flag==1
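
* Optional sanity check (a sketch, not part of the original workflow): each
* composite should now appear once per component, i.e. composite_total times.
bysort composite_nuts: assert _N == composite_total if composite_nuts != ""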

/*******************************************************************************
II. Classify NUTS regions based on their level (i.e. 1-3)
*******************************************************************************/

*(a) join with list of nuts regions, which contains variable nuts_level 
sort nuts
merge m:1 nuts using "$dta_files/nuts_level_merge.dta", gen(_merge_nuts_level)
drop if _merge_nuts_level==2 //not in our sample (Extra-Regio codes, other countries)

*note: the "nuts_level" variable now provides the NUTS level of each region for which we have data
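
* Optional diagnostic (a sketch, not part of the original workflow): list, by
* country, the regions in our data that found no match in the NUTS list and
* therefore have no NUTS level.
tabulate country if _merge_nuts_level==1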

sort nuts year 
save "$dta_files/step_0400_prepare_WVS_merge", replace



